Developing and Evaluating a Probabilistic LR Parser of Part-of-Speech and Punctuation Labels

Authors

  • Ted Briscoe
  • John A. Carroll
Abstract

We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the coverage of several corpora using this grammar and report the results of a parsing experiment using probabilities derived from bracketed training data. We report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally-occurring punctuation marks.
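The abstract says the parser's probabilities are derived from bracketed training data but does not say how. A common way to attach probabilities to an LR parser works like this. First, count how often each parse action is taken in each parse state across the training derivations. Then normalise those counts within each state. A complete parse is scored as the product of its action probabilities. The sketch below is a minimal illustration of that idea under our own assumptions, not the authors' estimation procedure. The helper names `estimate_action_probs` and `derivation_prob`, the state numbers, and the action labels are all hypothetical. We also assume the derivations have already been extracted from the bracketed data, and add-one smoothing is our own choice.

```python
import math
from collections import Counter

def estimate_action_probs(derivations, actions_by_state, smoothing=1.0):
    """Estimate P(action | state) from training derivations (hypothetical helper).

    Each derivation is a list of (state, action) pairs, assumed to have been
    recorded while the LR parser followed the analysis licensed by one
    bracketed training sentence. Counts are normalised separately within
    each state, with add-`smoothing` so that actions unseen in training
    keep a non-zero probability.
    """
    counts = Counter(step for derivation in derivations for step in derivation)
    probs = {}
    for state, actions in actions_by_state.items():
        total = sum(counts[(state, a)] for a in actions) + smoothing * len(actions)
        for a in actions:
            probs[(state, a)] = (counts[(state, a)] + smoothing) / total
    return probs

def derivation_prob(derivation, probs):
    """Score one complete parse as the product of its action probabilities."""
    return math.prod(probs[step] for step in derivation)

# Hypothetical parse-table fragment: state 3 has a shift/reduce conflict.
actions_by_state = {0: ["shift"], 3: ["shift", "reduce NP->N"], 5: ["accept"]}
training = [
    [(0, "shift"), (3, "reduce NP->N"), (5, "accept")],
    [(0, "shift"), (3, "reduce NP->N"), (5, "accept")],
    [(0, "shift"), (3, "shift"), (5, "accept")],
]
probs = estimate_action_probs(training, actions_by_state)
print(probs[(3, "shift")], probs[(3, "reduce NP->N")])  # prints 0.4 0.6
```

The probabilities are conditioned on the parse state, which encodes some left context. As a result, the same grammar rule can receive different probabilities in different contexts. This is one motivation often given for probabilistic LR parsing over a plain probabilistic context-free grammar.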

Similar resources

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, the Penn Treebank, was published. However, although originating in computational linguistics, the value of tree banks is becoming more widely appreciated in linguistics research as a whole. F...

Apportioning Development Effort in a Probabilistic LR Parsing System Through Evaluation

We describe an implemented system for robust domain-independent syntactic parsing of English, using a unification-based grammar of part-of-speech and punctuation labels coupled with a probabilistic LR parser. We present evaluations of the system’s performance along several different dimensions; these enable us to assess the contribution that each individual part is making to the success of the s...

Fast LR parsing Using Rich (Tree Adjoining) Grammars

We describe an LR parser of parts-of-speech (and punctuation labels) for Tree Adjoining Grammars (TAGs) that solves table conflicts in a greedy way, with a limited amount of backtracking. We evaluate the parser using the Penn Treebank, showing that the method yields very fast parsers with at least reasonable accuracy, confirming the intuition that LR parsing benefits from the use of rich grammars.

A Context-Sensitive Model for Probabilistic LR Parsing of Spoken Language with Transformation-Based Postprocessing

This paper describes a hybrid approach to spontaneous speech parsing. The implemented parser uses an extended probabilistic LR parsing model with rich context and its output is postprocessed by a symbolic tree transformation routine that tries to eliminate systematic errors of the parser. The parser has been trained for three different languages and was successfully integrated in the Verbmobil ...

A generalized LR parser for text-to-speech synthesis

The development of a parser for a Norwegian text-to-speech system is reported. The Generalized Left Right (GLR) algorithm [1] is applied, which is a generalization of the well known LR algorithm for parsing computer languages. This paper describes briefly the GLR algorithm, the integration of a probabilistic scoring model, our implementation of the parser in C++, attribute structures, lexical i...

Journal:
  • CoRR

Volume: abs/cmp-lg/9510005

Publication date: 1995